
    Rhythm-Flexible Voice Conversion without Parallel Data Using Cycle-GAN over Phoneme Posteriorgram Sequences

    Speaking rate refers to the average number of phonemes per unit time, while rhythmic patterns refer to the duration distributions of different phonemes realized within different phonetic structures. Both are key components of prosody in speech, and both vary from speaker to speaker. Models such as the cycle-consistent adversarial network (Cycle-GAN) and the variational auto-encoder (VAE) have been successfully applied to voice conversion without parallel data. However, because of the neural network architectures and feature vectors chosen for these approaches, the length of the predicted utterance must equal that of the input utterance, which limits how well they can mimic the target speaker's speaking rate and rhythmic patterns. Sequence-to-sequence learning models, on the other hand, have been used to remove this length constraint, but they require parallel training data. In this paper, we propose an approach that uses a sequence-to-sequence model trained with unsupervised Cycle-GAN to transform phoneme posteriorgram sequences between speakers. This removes the length constraint and offers rhythm-flexible voice conversion without requiring parallel data. Preliminary evaluations on two datasets showed very encouraging results. Comment: 8 pages, 6 figures, Submitted to SLT 201
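
    The two prosodic quantities defined above can be made concrete with a small sketch. It assumes a hypothetical phone alignment given as (phoneme, start_sec, end_sec) tuples in time order, such as a forced aligner might produce; the paper itself operates on phoneme posteriorgram sequences, so this code only illustrates the definitions and is not part of the proposed method. Speaking rate is taken here as phonemes per second over the span from the first segment's start to the last segment's end, and the rhythm statistics are the per-phoneme mean and standard deviation of durations.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical alignment format (an assumption, not from the paper):
# (phoneme, start_sec, end_sec), with segments listed in time order.
Segment = tuple[str, float, float]


def speaking_rate(segments: list[Segment]) -> float:
    """Phonemes per second over the span from the first start to the last end."""
    if not segments:
        return 0.0
    span = segments[-1][2] - segments[0][1]
    if span <= 0:
        raise ValueError("alignment must cover a positive time span")
    return len(segments) / span


def duration_stats(segments: list[Segment]) -> dict[str, tuple[float, float]]:
    """Mean and population standard deviation of duration for each phoneme label."""
    durations: dict[str, list[float]] = defaultdict(list)
    for phone, start, end in segments:
        durations[phone].append(end - start)
    return {phone: (mean(d), pstdev(d)) for phone, d in durations.items()}


if __name__ == "__main__":
    utterance = [("h", 0.0, 0.05), ("ə", 0.05, 0.15), ("l", 0.15, 0.25), ("oʊ", 0.25, 0.5)]
    print(speaking_rate(utterance))  # → 8.0 (4 phonemes over 0.5 s)
    print(duration_stats(utterance))
```

    Comparing these statistics between source and target speakers is one way to quantify the rhythm mismatch that a fixed-length conversion cannot correct.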

    AV2Wav: Diffusion-Based Re-synthesis from Continuous Self-supervised Features for Audio-Visual Speech Enhancement

    Speech enhancement systems are typically trained on pairs of clean and noisy speech. In audio-visual speech enhancement (AVSE), less ground-truth clean data is available: most audio-visual datasets are collected in real-world environments with background noise and reverberation, which hampers the development of AVSE. In this work, we introduce AV2Wav, a resynthesis-based audio-visual speech enhancement approach that can generate clean speech despite the challenges of real-world training data. We obtain a subset of nearly clean speech from an audio-visual corpus using a neural quality estimator, and then train a diffusion model on this subset to generate waveforms conditioned on continuous speech representations from AV-HuBERT, using noise-robust training. We use continuous rather than discrete representations to retain prosody and speaker information. With this vocoding task alone, the model performs speech enhancement better than a masking-based baseline. We further fine-tune the diffusion model on clean/noisy utterance pairs to improve performance. Our approach outperforms a masking-based baseline on both automatic metrics and a human listening test, and is close in quality to the target speech in the listening test. Audio samples can be found at https://home.ttic.edu/~jcchou/demo/avse/avse_demo.html. Comment: Submitted to ICASSP 202

    Few-Shot Spoken Language Understanding via Joint Speech-Text Models

    Recent work on speech representation models jointly pre-trained with text has demonstrated the potential of improving speech representations by encoding speech and text in a shared space. In this paper, we leverage such shared representations to address the persistent challenge of limited data availability in spoken language understanding tasks. Using a pre-trained speech-text model, we find that models fine-tuned on text can be effectively transferred to speech test data. With as little as 1 hour of labeled speech data, our proposed approach achieves performance on spoken language understanding tasks (specifically, sentiment analysis and named entity recognition) comparable to previous methods that use speech-only pre-trained models fine-tuned on 10 times more data. Beyond this proof-of-concept study, we also analyze the latent representations. We find that the bottom layers of speech-text models are largely task-agnostic and align speech and text representations into a shared space, while the top layers are more task-specific.

    Toward Joint Language Modeling for Speech Units and Text

    Speech and text are two major forms of human language. For many years, the research community has focused on mapping speech to text or vice versa. In language modeling, however, very little effort has been made to model them jointly. In light of this, we explore joint language modeling for speech units and text. Specifically, we compare different speech tokenizers for transforming continuous speech signals into discrete units and use different methods to construct mixed speech-text data. We introduce automatic metrics to evaluate how well the joint LM mixes speech and text. We also fine-tune the LM on downstream spoken language understanding (SLU) tasks with different modalities (speech or text) and test its performance to assess how well the model has learned shared representations. Our results show that by mixing speech units and text with our proposed mixing techniques, the joint LM improves over a speech-only baseline on SLU tasks and shows zero-shot cross-modal transferability. Comment: EMNLP findings 202
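
    The abstract does not specify its mixing methods, so the following is only a hypothetical illustration of what mixed speech-text data can look like: each word is rendered either as text or as its aligned discrete speech units, chosen at random. The word-to-unit alignment, the `<uN>` token format, the `p_speech` parameter, and the collapsing of repeated consecutive units (a step often applied to HuBERT-style units) are all assumptions, not details taken from the paper.

```python
import random

# Hypothetical input (assumption): each word paired with the discrete
# speech-unit IDs that a speech tokenizer produced for its audio span.
WordUnits = tuple[str, list[int]]


def dedup_units(units: list[int]) -> list[int]:
    """Collapse runs of identical consecutive unit IDs into a single ID."""
    out: list[int] = []
    for u in units:
        if not out or out[-1] != u:
            out.append(u)
    return out


def mix_word_level(aligned: list[WordUnits], p_speech: float, seed: int = 0) -> list[str]:
    """Render each word as unit tokens with probability p_speech, else as text."""
    rng = random.Random(seed)  # seeded for reproducible mixing
    tokens: list[str] = []
    for word, units in aligned:
        if rng.random() < p_speech:
            tokens.extend(f"<u{u}>" for u in dedup_units(units))
        else:
            tokens.append(word)
    return tokens


if __name__ == "__main__":
    sample = [("hello", [5, 5, 7]), ("world", [3, 3])]
    print(mix_word_level(sample, p_speech=0.5))
```

    Varying `p_speech` moves the mixed sequence between pure text (0.0) and pure speech units (1.0), which is one simple way to expose a single LM to both modalities within the same sequence.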

    Proteomic analysis of rhein-induced cyt: ER stress mediates cell death in breast cancer cells

    Rhein is a natural product purified from herbal plants such as Rheum palmatum and has been shown to have anti-angiogenesis and anti-tumor-metastasis properties. However, the biological effects of rhein on the behavior of breast cancers have not been fully elucidated. To evaluate whether rhein might be useful in treating breast cancer, and to investigate its cytotoxic mechanism, we analyzed the impact of rhein treatment on differential protein expression and redox regulation in a non-invasive breast cancer cell line, MCF-7, and an invasive breast cancer cell line, MDA-MB-231, using lysine- and cysteine-labeling two-dimensional difference gel electrophoresis (2D-DIGE) combined with MALDI-TOF/TOF mass spectrometry. This proteomic study revealed that 73 proteins were significantly changed in expression, while 9 proteins were significantly altered in thiol reactivity, in both MCF-7 and MDA-MB-231 cells. The results also demonstrated that rhein-induced cytotoxicity in breast cancer cells mostly involves dysregulation of cytoskeletal regulation, protein folding, the glycolysis pathway, and transcriptional control. A further study indicated that rhein promotes misfolding of cellular proteins and unbalances the cellular redox status, leading to ER stress. Our work shows that the current proteomic strategy offers a high-throughput platform for studying the molecular mechanisms of rhein-induced cytotoxicity in breast cancer cells. The identified differentially expressed proteins might be further evaluated as potential targets in breast cancer therapy.

    Elevated BCRP/ABCG2 Expression Confers Acquired Resistance to Gefitinib in Wild-Type EGFR-Expressing Cells

    The sensitivity of non-small cell lung cancer (NSCLC) patients to EGFR tyrosine kinase inhibitors (TKIs) is strongly associated with activating EGFR mutations. Although less sensitive than patients harboring these mutations, some patients with wild-type EGFR (wtEGFR) remain responsive to EGFR TKIs, suggesting that unexplored mechanisms render most wtEGFR-expressing cancer cells insensitive. Here, we show that acquired resistance of wtEGFR-expressing cancer cells to the EGFR TKI gefitinib is associated with elevated expression of breast cancer resistance protein (BCRP/ABCG2), which in turn leads to gefitinib efflux from cells. In addition, BCRP/ABCG2 expression correlates with poor response to gefitinib in both cancer cell lines and lung cancer patients with wtEGFR. Co-treatment with BCRP/ABCG2 inhibitors enhanced the anti-tumor activity of gefitinib. Thus, BCRP/ABCG2 expression may be a predictor of poor gefitinib efficacy, and targeting BCRP/ABCG2 may broaden the use of gefitinib in patients with wtEGFR.

    Adjuvant chemotherapy and survival outcomes in rectal cancer patients with good response (ypT0-2N0) after neoadjuvant chemoradiotherapy and surgery: A retrospective nationwide analysis

    Background: For rectal cancer, it remains unclear how to incorporate tumor response to neoadjuvant chemoradiotherapy (nCRT) when deciding whether to give adjuvant chemotherapy. In this study, we aim to determine the survival benefit of adjuvant chemotherapy for rectal cancer patients with good response (ypT0-2N0) after nCRT and surgery.
    Methods: The study cohort included 720 rectal cancer patients from the Taiwan Cancer Registry and the National Health Insurance Research Database who had good response (ypT0-2N0) after nCRT and surgery and who did or did not receive adjuvant chemotherapy between January 2007 and December 2017. The Kaplan–Meier method, log-rank tests, and Cox regression analysis were used to investigate the effect of adjuvant chemotherapy on 5-year overall survival (OS) and disease-free survival (DFS).
    Results: Of 720 patients, 368 (51.1%) received adjuvant chemotherapy and 352 (48.9%) did not. Patients who received adjuvant chemotherapy were more likely to be female and younger (≤ 65), and to have advanced clinical T (3-4)/N (1-2) classification and ypT2 classification. No significant difference in 5-year OS (p=0.681) or DFS (p=0.942) was observed between patients who did and did not receive adjuvant chemotherapy. Multivariable analysis revealed that adjuvant chemotherapy was not associated with better OS (adjusted hazard ratio [aHR], 1.03; 95% confidence interval [CI], 0.88-1.21) or DFS (aHR, 1.05; 95% CI, 0.89-1.24). Stratified analysis for OS and DFS found no significant protective effect of adjuvant chemotherapy, even for patients with advanced clinical T or N classification.
    Conclusion: Adjuvant chemotherapy may be omitted in rectal cancer patients with good response (ypT0-2N0) after nCRT and surgery.
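
    The Kaplan–Meier method named in the Methods is a product-limit estimator: at each distinct event time t, the running survival estimate is multiplied by (1 - d/n), where d is the number of events at t and n is the number of patients still at risk just before t. The minimal sketch below runs on toy inputs only (not the study's data), and the function name `kaplan_meier` is hypothetical; it omits the log-rank tests, Cox regression, and confidence intervals the study also used.

```python
from itertools import groupby


def kaplan_meier(times: list[float], events: list[bool]) -> list[tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.

    events[i] is True if subject i had the event (e.g. death) at times[i],
    and False if subject i was censored at times[i].
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve: list[tuple[float, float]] = []
    # Group subjects sharing the same time; all of them count as at risk at t.
    for t, rows in groupby(data, key=lambda row: row[0]):
        rows = list(rows)
        deaths = sum(1 for _, event in rows if event)
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Events and censorings at t both leave the risk set after t.
        n_at_risk -= len(rows)
    return curve


if __name__ == "__main__":
    print(kaplan_meier([1, 2, 2, 3, 4], [True, True, False, True, False]))
```

    Censored subjects count as at risk at their censoring time and are removed afterward, the usual convention when an event and a censoring share the same time.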